The novel coronavirus, also known as SARS-CoV-2, is a contagious respiratory virus that was first reported in Wuhan, China. On 2/11/2020, the World Health Organization designated the name COVID-19 for the disease caused by the novel coronavirus. This notebook explores COVID-19 data for the US.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
%matplotlib inline
Importing the dataset, which contains the number of COVID-19 cases diagnosed per day and the number of deaths reported on the corresponding date.
df_cases_death = pd.read_csv("/Users/SDe11/Documents/Projects/Covid19 Visualizations/us_cases_deaths.csv")
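The date column can also be converted to real timestamps at load time. The sketch below is only an illustration under assumptions: it reads a hypothetical three-row sample from memory instead of the CSV file above, and `df_sample` is a name made up for this example.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for us_cases_deaths.csv
sample_csv = io.StringIO(
    "date,cases,deaths\n"
    "2020-01-21,1,0\n"
    "2020-01-22,1,0\n"
    "2020-01-23,1,0\n"
)

# parse_dates converts the 'date' column to datetime64 while reading
df_sample = pd.read_csv(sample_csv, parse_dates=["date"])
print(df_sample.dtypes)
```

With a datetime column, plotting libraries can space and label the x-axis by time instead of treating each date as a separate string.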
A sneak peek at the sample data
df_cases_death.head()
| | date | cases | deaths |
|---|---|---|---|
| 0 | 2020-01-21 | 1 | 0 |
| 1 | 2020-01-22 | 1 | 0 |
| 2 | 2020-01-23 | 1 | 0 |
| 3 | 2020-01-24 | 2 | 0 |
| 4 | 2020-01-25 | 3 | 0 |
# Fetching the "date", "cases" and "deaths" columns for plotting
date_plot = df_cases_death["date"]
cases_plot = df_cases_death["cases"]
death_plot = df_cases_death["deaths"]
The cumulative curve of cases shows that the disease kept spreading in spite of measures such as lockdowns, which were put in place to curb the impact of the virus. As of June 2021, a total of 33,465,562 people in the US had been diagnosed as COVID-positive.
plt.title('Cumulative distribution function of number of cases in US')
ax = plt.gca()
maximum_cases = max(cases_plot)
last_date = max(date_plot)
ax.axes.yaxis.set_visible(False)
ax.axes.xaxis.set_visible(False)
plt.annotate(maximum_cases, xy=( last_date, maximum_cases))
plt.plot(date_plot, cases_plot )
plt.show()
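Because the curve above is cumulative, the number of new cases per day is the difference between consecutive values. A minimal sketch, assuming the counts are running totals and using hypothetical values that mirror the first rows of the sample data:

```python
import pandas as pd

# Hypothetical cumulative counts, mirroring the first rows shown earlier
cumulative = pd.Series(
    [1, 1, 1, 2, 3],
    index=pd.to_datetime(
        ["2020-01-21", "2020-01-22", "2020-01-23", "2020-01-24", "2020-01-25"]
    ),
    name="cases",
)

# diff() gives the day-over-day change; the first day has no predecessor,
# so it is filled with that day's own cumulative value
daily_new = cumulative.diff().fillna(cumulative.iloc[0]).astype(int)
print(daily_new.tolist())
```

The daily values add back up to the last cumulative value, which is a quick sanity check on the conversion.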
COVID-19 proved especially fatal for the elderly population with underlying health conditions. The graph plots the number of people who died of COVID-19 in the US.
plt.title('Cumulative distribution function of number of deaths in US')
ax = plt.gca()
maximum_deaths = max(death_plot)
last_date = max(date_plot)
ax.axes.yaxis.set_visible(False)
ax.axes.xaxis.set_visible(False)
plt.annotate(maximum_deaths, xy=( last_date, maximum_deaths))
plt.plot(date_plot, death_plot)
plt.show()
A stacked area chart is typically used when one wants to track not only the total value but also the breakdown of that total by group. Here, a stacked area chart is used to compare the number of positive cases with the number of fatal cases.
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(
x=date_plot, y=death_plot,
mode='lines',
name='Fatal Cases',
line=dict(width=0.5, color='#BE2625'),
stackgroup='one',
groupnorm='percent' # sets the normalization for the sum of the stackgroup
))
fig.add_trace(go.Scatter(
x=date_plot, y=cases_plot,
mode='lines',
name='Positive Cases',  # the column holds all confirmed cases, not only active ones
line=dict(width=0.5, color='#FFA500'),
stackgroup='one'
))
fig.update_layout(
showlegend=True,
xaxis_tickformat = '%d %B (%a)<br>%Y',
yaxis=dict(
type='linear',
range=[1, 100],
ticksuffix='%'))
fig.show()
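With `groupnorm='percent'`, each trace is shown as its share of the stacked total at every date. The arithmetic can be sketched in pandas as follows; the numbers are hypothetical and `share_pct` is a name made up for this example.

```python
import pandas as pd

# Hypothetical cumulative values for three dates
sample = pd.DataFrame(
    {"cases": [90, 180, 475], "deaths": [10, 20, 25]},
    index=pd.to_datetime(["2020-04-01", "2020-05-01", "2020-06-01"]),
)

# Each trace's share of the stacked total (cases + deaths) at every date, in percent
stack_total = sample["cases"] + sample["deaths"]
share_pct = sample.div(stack_total, axis=0) * 100
print(share_pct.round(1))
```

Note that this share is deaths divided by cases plus deaths, which is not the same as deaths divided by cases.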
# Confirmed cases by state: the dataset provides a state-by-state breakdown of the cases.
df_cases_states = pd.read_csv("/Users/SDe11/Documents/Projects/Covid19 Visualizations/us-states.csv")
Pie chart of the top 10 states by number of deaths. It shows that New York has the highest number of deaths.
deaths_by_state = df_cases_states.groupby(['state'])['deaths'].sum()
top_10_states_d = deaths_by_state.sort_values(ascending=False).head(10)
c = ['lightcoral', 'rosybrown', 'sandybrown', 'navajowhite', 'gold',
'khaki', 'lightskyblue', 'turquoise', 'lightslategrey', 'thistle', 'pink']
plt.figure(figsize=(20,15))
plt.title('Covid-19 Confirmed Deaths per State', size=20)
plt.pie(top_10_states_d.values, colors=c,shadow=True, labels=top_10_states_d.values)
plt.legend(top_10_states_d.index, loc='best', fontsize=12)
plt.show()
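If the `deaths` column in the state file is a running total (an assumption worth checking against the data), the latest row per state already holds that state's overall count, and summing over all dates counts the same deaths many times. A small sketch with hypothetical rows shows the difference:

```python
import pandas as pd

# Hypothetical rows in the shape of us-states.csv (date, state, cases, deaths)
sample = pd.DataFrame({
    "date": ["2021-06-01", "2021-06-02", "2021-06-01", "2021-06-02"],
    "state": ["New York", "New York", "Oregon", "Oregon"],
    "cases": [100, 110, 40, 45],
    "deaths": [10, 12, 2, 3],
})

# Assumption: 'deaths' is cumulative, so the most recent row per state is the total
latest = sample.sort_values("date").groupby("state")["deaths"].last()

# Summing every daily row instead adds up the running totals
summed = sample.groupby("state")["deaths"].sum()

print(latest.sort_values(ascending=False))
print(summed.sort_values(ascending=False))
```

The ranking of states may come out the same either way, but the magnitudes differ, which matters for the labels on the chart.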
Bar chart of the top 10 states by number of cases. It shows that California has the most positive cases of any state.
cases_by_state = df_cases_states.groupby(['state'])['cases'].sum()
top_10_states = cases_by_state.sort_values(ascending=False).head(10)
plt.bar(top_10_states.index, top_10_states.values, color=(0.2, 0.4, 0.6, 0.6),
width = 0.4)
plt.xticks(rotation = 45)
plt.title("Covid-19 Confirmed Cases per State")
plt.show()
A choropleth map is a map composed of colored polygons, used to represent spatial variation of a quantity. The following choropleth shows how badly the disease impacted different states across the US. The graph shows that the states with the most populous cities are more severely impacted.
us_state_abbrev = {
'Alabama': 'AL',
'Alaska': 'AK',
'American Samoa': 'AS',
'Arizona': 'AZ',
'Arkansas': 'AR',
'California': 'CA',
'Colorado': 'CO',
'Connecticut': 'CT',
'Delaware': 'DE',
'District of Columbia': 'DC',
'Florida': 'FL',
'Georgia': 'GA',
'Guam': 'GU',
'Hawaii': 'HI',
'Idaho': 'ID',
'Illinois': 'IL',
'Indiana': 'IN',
'Iowa': 'IA',
'Kansas': 'KS',
'Kentucky': 'KY',
'Louisiana': 'LA',
'Maine': 'ME',
'Maryland': 'MD',
'Massachusetts': 'MA',
'Michigan': 'MI',
'Minnesota': 'MN',
'Mississippi': 'MS',
'Missouri': 'MO',
'Montana': 'MT',
'Nebraska': 'NE',
'Nevada': 'NV',
'New Hampshire': 'NH',
'New Jersey': 'NJ',
'New Mexico': 'NM',
'New York': 'NY',
'North Carolina': 'NC',
'North Dakota': 'ND',
'Northern Mariana Islands':'MP',
'Ohio': 'OH',
'Oklahoma': 'OK',
'Oregon': 'OR',
'Pennsylvania': 'PA',
'Puerto Rico': 'PR',
'Rhode Island': 'RI',
'South Carolina': 'SC',
'South Dakota': 'SD',
'Tennessee': 'TN',
'Texas': 'TX',
'Utah': 'UT',
'Vermont': 'VT',
'Virgin Islands': 'VI',
'Virginia': 'VA',
'Washington': 'WA',
'West Virginia': 'WV',
'Wisconsin': 'WI',
'Wyoming': 'WY'
}
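# groupby(...).sum() returned a Series indexed by state; turn it into a
# DataFrame with 'state' and 'cases' columns before adding the abbreviation
cases_by_state = cases_by_state.reset_index()
cases_by_state['state_abbr'] = cases_by_state['state'].map(us_state_abbrev)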
import plotly.express as px
fig = px.choropleth(cases_by_state, locations='state_abbr', color='cases',
color_continuous_scale="PuBu",
locationmode="USA-states",
range_color=(0, 200099982),
scope="usa",
labels={'state':'cases'}
)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
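When state names are translated to abbreviations, it helps to check which names have no entry in the dictionary, since those rows cannot be placed on the map. A minimal sketch, using a two-entry stand-in dictionary and a deliberately unmapped, hypothetical state name:

```python
import pandas as pd

# Small stand-in for the full name-to-abbreviation dictionary
abbrev = {"California": "CA", "Oregon": "OR"}

# Hypothetical per-state totals; 'Atlantis' is deliberately not in the dictionary
totals = pd.DataFrame(
    {"state": ["California", "Oregon", "Atlantis"], "cases": [300, 40, 1]}
)

# map() returns NaN for names missing from the dictionary instead of raising KeyError
totals["state_abbr"] = totals["state"].map(abbrev)
unmapped = totals.loc[totals["state_abbr"].isna(), "state"].tolist()
print(unmapped)

# Keep only the rows that can be placed on the map
mappable = totals.dropna(subset=["state_abbr"])
```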
The following graph depicts a time series of the monthly spread of the disease across the US. Although the first cases in the US were diagnosed in early 2020, a major spike in numbers was noted in the early months of 2021. There has been a sharp decrease in numbers starting in April 2021, which corresponds to when vaccines started rolling out in most cities.
from pandas.errors import SettingWithCopyWarning  # public import path; pandas.core.common is internal
import warnings
warnings.simplefilter(action="ignore", category=SettingWithCopyWarning)
series = df_cases_states[['date' , 'cases']].copy()  # copy so the date conversion does not write into a view
series['date'] = pd.to_datetime(series['date'], format='%m/%d/%y' , errors='coerce')
series['cases'].groupby(series['date'].dt.to_period('M')).sum().plot(kind='line')
ax = plt.gca()
ax.axes.yaxis.set_visible(False)
plt.show()
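Because `errors='coerce'` silently turns every date that does not match the format into `NaT`, it is worth counting the failures before grouping by month. A minimal sketch with hypothetical date strings, one of which intentionally does not match `%m/%d/%y`:

```python
import pandas as pd

# Hypothetical raw date strings; the second is ISO-style and does not match the format
raw_dates = pd.Series(["01/21/20", "2020-01-22", "01/23/20"])

parsed = pd.to_datetime(raw_dates, format="%m/%d/%y", errors="coerce")

# Rows that failed to parse become NaT and drop out of any later groupby
n_failed = int(parsed.isna().sum())
print(f"{n_failed} of {len(parsed)} dates could not be parsed")
```

If the count is large, the format string probably does not match how the dates are stored in the file.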
As a resident of Oregon, I was particularly interested in how Oregon's curve compares with that of the whole US. A similar curve is seen for Oregon, with the peak appearing in May 2021.
series_oregon = df_cases_states[df_cases_states['state'] == 'Oregon']
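series_oregon = series_oregon[['date' , 'cases']].copy()  # copy so the date conversion does not write into a view
series_oregon['date'] = pd.to_datetime(series_oregon['date'], format='%m/%d/%y' , errors='coerce')
# Keep only 2021 data; the ISO date string avoids day/month ambiguity
series_oregon = series_oregon[series_oregon['date'] > '2021-01-01']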
series_oregon['cases'].groupby(series_oregon['date'].dt.to_period('M')).sum().plot(kind='line')
ax = plt.gca()
ax.axes.yaxis.set_visible(False)
plt.show()
# Monthly deaths across the US
series_death = df_cases_states[['date' , 'deaths']].copy()  # copy so the date conversion does not write into a view
series_death['date'] = pd.to_datetime(series_death['date'], format='%m/%d/%y' , errors='coerce')
series_death['deaths'].groupby(series_death['date'].dt.to_period('M')).sum().plot(kind='line')
ax = plt.gca()
ax.axes.yaxis.set_visible(False)
plt.show()
series_oregon = df_cases_states[df_cases_states['state'] == 'Oregon']
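# Monthly deaths in Oregon
series_oregon_death = series_oregon[['date' , 'deaths']].copy()  # copy so the date conversion does not write into a view
series_oregon_death['date'] = pd.to_datetime(series_oregon_death['date'], format='%m/%d/%y' , errors='coerce')
# Keep only 2021 data; the ISO date string avoids day/month ambiguity
series_oregon_death = series_oregon_death[series_oregon_death['date'] > '2021-01-01']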
series_oregon_death['deaths'].groupby(series_oregon_death['date'].dt.to_period('M')).sum().plot(kind='line')
ax = plt.gca()
ax.axes.yaxis.set_visible(False)
plt.show()
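The four monthly plots above repeat the same steps: parse the dates, optionally filter by state and start date, then sum by month. One way to avoid the repetition is a small helper. This is only a sketch; `monthly_sum` and its parameters are names made up for this example, and the sample rows are hypothetical.

```python
import pandas as pd

def monthly_sum(df, column, state=None, start=None, date_format="%m/%d/%y"):
    """Sum `column` per calendar month, optionally for one state and after `start`."""
    data = df.copy()  # work on a copy so the caller's frame is untouched
    data["date"] = pd.to_datetime(data["date"], format=date_format, errors="coerce")
    if state is not None:
        data = data[data["state"] == state]
    if start is not None:
        data = data[data["date"] > pd.Timestamp(start)]
    return data.groupby(data["date"].dt.to_period("M"))[column].sum()

# Hypothetical rows in the shape of the state-level dataset
sample = pd.DataFrame({
    "date": ["12/31/20", "01/15/21", "01/20/21", "02/01/21", "01/15/21"],
    "state": ["Oregon", "Oregon", "Oregon", "Oregon", "Texas"],
    "cases": [5, 10, 20, 7, 100],
    "deaths": [0, 1, 2, 1, 9],
})

oregon_cases = monthly_sum(sample, "cases", state="Oregon", start="2021-01-01")
us_deaths = monthly_sum(sample, "deaths")
print(oregon_cases)
print(us_deaths)
```

Each result is a Series indexed by month, so calling `.plot(kind='line')` on it would give the same kind of chart as the cells above.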
Vaccination plays a major role in the prevention of COVID-19, especially within the adult population. The number of positive cases went down significantly as more and more people got vaccinated.
#Vaccinations
df_vaccinations = pd.read_csv("/Users/SDe11/Documents/Projects/Covid19 Visualizations/UsStateVaccinations.csv")
df_vaccinations['date'] = pd.to_datetime(df_vaccinations['date'], format='%Y-%m-%d' , errors='coerce')
df_vaccinations['people_fully_vaccinated_per_hundred'].groupby(df_vaccinations['date'].dt.to_period('M')).sum().plot(kind='line')
ax = plt.gca()
plt.show()
#Vaccinations
df_vaccinations_oregon = df_vaccinations[df_vaccinations["location"] == "Oregon"]
df_vaccinations_oregon['people_fully_vaccinated_per_hundred'].groupby(df_vaccinations_oregon['date'].dt.to_period('M')).sum().plot(kind='line')
ax = plt.gca()
plt.show()
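If `people_fully_vaccinated_per_hundred` is a running percentage (an assumption to verify against the file), summing it over the days of a month adds the same people repeatedly. An alternative is to take the last reading of each month. A minimal sketch with hypothetical values for one state:

```python
import pandas as pd

# Hypothetical rows in the shape of UsStateVaccinations.csv
sample = pd.DataFrame({
    "date": pd.to_datetime(["2021-03-01", "2021-03-31", "2021-04-15", "2021-04-30"]),
    "location": ["Oregon"] * 4,
    "people_fully_vaccinated_per_hundred": [9.0, 17.5, 26.0, 31.2],
})

# Assumption: the column is cumulative, so the last reading in a month
# is the vaccination level reached by the end of that month
sample["month"] = sample["date"].dt.to_period("M")
month_end_level = (
    sample.sort_values("date")
    .groupby("month")["people_fully_vaccinated_per_hundred"]
    .last()
)
print(month_end_level)
```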